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1. INTRODUCTION 

Since the first version of simulated annealing algorithm described by [1], researchers focused on 
two strategies in order to improve the performance of SA. The first strategy was the implementation of 
parallel simulated annealing [2-4]. The second one was the optimization of cooling schedule and the 
adaptation of parameters. The cooling schedule is an important set of parameters that governs the 
convergence of SA. The set of annealing schedule as defined by [5], includes the cooling factor, the starting 
and stopping temperature, and the number of moves at each temperature. The cooling factor is the most 
influential feature among the set of annealing schedule. This factor can be defined as the method for which 
the algorithm reduces the temperature to its next value. If the temperature is reduced very quickly, a 
convergence to a local minimum may occur. However, if it is reduced too slowly, the algorithm takes a long 
time to converge. The most frequently used decrement rule is geometric schedule [6-8] in which the 
temperature decrease at each step t is governed by the formula 8, = a.6,_,, where 0.85 < a < 0.96 and ais a 
constant. Another method which outperforms the commonly used geometric cooling, was proposed by Lundy 
[9-13]. Lundy’s cooling law uses the flowing formula : 0, = Ot-1/(1 + B ®_1) where B is a suitably small 
value. 

The use of machine learning to tune heuristic was adopted by many researchers [14- 17]. Especially, 
the Hidden Markov Model (HMM) [18]. HMMs success is due to ability to deal with the variability by 
means of stochastic modeling. It was used to enhance the behavior of metaheuristics by estimating their best 
configuration [19- 25]. 

This paper presents a new approach to enhance SA, which consists of tuning the Lundy’s cooling 
law during the run, using the Hidden Markov Chain. The main idea is to predict the best cooling law 
parameter based on history of the run. To do that, first we train the HMM model by updating its parameters 
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using Baum-Welch algorithm [18]. Then we proceed to a classification process through the Viterbi algorithm 
[18] which gives the most probable cooling law parameter. The rest of this paper is organized as follows. 
Section 2 is devoted hybridization methodology of HMM, SA, then section 3 presents and discusses the 
experimental results, and finally, we conclude the paper in section 4. 


2. THE RESEARCH METHOD 

To enhance the performance of SA, an hybridization with the HMM was adopted. During the run, the 
hidden Markov model performs classification based on observable sequence generated from a set of rules. 
This sequence allows the model to guess the hidden state which can be a slow cooling helping the algorithm to 
converge to a global minimum, a medium or rapid cooling to speed up the search when no improvement in 
solution occurs (Figure 1). 


Slow Cooling Medium Cooling Rapid Cooling 


Figure 1. Markov chain for simulated annealing algorithm 


The Hidden Markov Model can be defined as 5-tuple (S, O, A,B, T?) where: 

a. S= {S1, S2, 53} is set of hidden states, which is respectively: slow, medium and rapid cooling. 

b. Sı: is SA with a slow Lundy decrease law, the cooling factor is B = 0.005. 

c. S: is the same variant of simulated annealing, where the cooling law is faster than the previous one 

B = 0.05. 

S3: is Lundy simulated annealing variant with where the rapid cooling law 6 = 0.1. 

O = (0,,02 ,...,05) is the set of the observation per state. 

f. A = (a;j) is a transition probability matrix, where aj; is the probability that the state at time t + 1 is S; , 
is given when the state at time t is S; 

g. T° = ( T}, T9, 78 ) is the initial probability, where 7t? is the probability of being in the state S;. 

h. B = (biz) is the observation probabilities, where bj, is the probability of observing O+ in state S;. This 
observations matrix B of hidden markov model is estimated at early stage by Maximum Likelihood 
Estimation (MLE). 

The main purpose of this model is to estimate state sequence S that best explains the observation 
sequence O. To generate the observable sequence of HMM model. We use a progression rate described in 

Equation (1), and a measure of the acceptance rate of the proposed solution described in Equation (2). 


pe 


p = (Number of proposal)/(Inner loop x Outer loop) (1) 


Where in Equation (1), the number of proposal is the number of solution generated by the neighborhood 
function in each iteration, inner and outer loop are the maximum number of iterations established for SA to 
find the best solution. 


w, = (Number of accepted solutions)/(Number of proposal) (2) 


In Equation (2), the number of accepted solutions at iteration t is the accumulated number of accepted 
solution until the current iteration; and like the Eq. 1, number of proposal is the number of solution generated 
during the search. The acceptance rate w,, and the progression rate p are then used to generate a sequence of 
class from a set of rules as follow: 

a. 04: little decrease of acceptance rate. 
02: no improvement in cost function even if the progression rate is less than 50%. 
O3 : a great decrease of acceptance rate. 
O, : a little increase of acceptance rate. 
O; : a huge increase of acceptance rate. 


ones 
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During the run a set of observation is generated as follow: 


Algorithm 1: Generate_Observation 
Input: We P, Rule_1(w,p), Rule_2(w,p), Rule_3(w,p), b bRule_4(w,p), Rule_5(w,p) 
Output: O Current Observation 
If Rule_1(wW,p)==TRUE then O<1 End 
If Rule 2 (w;,,p)==TRUE then Oe2 End 
If Rule 3(w;,,p)==TRUE then 0<3 End 
If Rule 4(w,,p)==TRUE then O4 End 
If Rule 5(w,,p)==TRUE then 0<-5 End 
Return O 


The purpose of this model is to estimate state S that best explains the observation sequence O. Given 
the observation sequence O = 0, 0; ... Or and a model A = (A, B, m). Firstly, we estimate the transition and 
emission probabilities from the first sequence of observation using a supervised training. In which, we count 
frequencies of transmissions and emissions of the model: 


Algorithm 2: MLE 
Input: 0 = (0, 0; ... Or) 
Output: A= (a;j),B =(bi) 


For i = 1 to T-1 do 26 ,0i4, = %0;0i4, t1 End 
For i = 1 toT do boo; = boio + 1 End 
For i=1to3 do A; = Yj.ayj and Bi = Dja bij End 
Fori=1to 3do 
For j=lto 3 do aj = a;;/Aj End 
For t=1 toTdo by = bi,/B; End 
End 
Return A 


Then we use the Viterbi to select the corresponding state sequence Q = q1q2 ... qr that best explains 
observations, secondly, the Baum Welch adjusts the model parameters 2 = (A,B,7) to maximize 
P(O|A), i.e., the probability of the observation sequence given the model. 


2.1. Viterbi Algorithm 
After model parameters definition, the Viterbi algorithm is used to build HMM classification 
process. This algorithm is used to compute the most probable path as well as its probability. 


Algorithm 3: Viterbi 
Input: S=(51,52,53), O=(0,02..0r), A = (aj), B = (bir) , T° = (n9, 19,19 ) 
Output: s* = (sj,53,53) the most probable sequence of states 
For i = 1to3 do 6,(i)=b,(0,)m; and y,(i)=0 End {Initialization} 
For t = 2 to T do 
For j=l to 3 do 
6G) = maxi, [5-1 (aijb;(o;,)] and ,(j/) = argmax}., [6-1 aij] 


End 
End 
Pmax = Maxj-, [6r(D] ; Sio = argmax}_,[57(O] 
For t=T-1 to 1 do sj = Wrai(Siu1) End 
Return s* 


2.2. Baum Welch Algorithm 


The Baum—Welch algorithm is used to adjust the parameters of HMM. This training step is based on 
Forward-Backward algorithm. 


2.2.1. Forward Algorithm 

The first algorithm used by the Baum-Welch algorithm is the Forward algorithm. This algorithm 
returns the forward variable a,(t) defined as the probability of the partial observation sequence until time t, 
with state Sj at time t, a;(t)=P(0,0, 0040 = S;|a), and we define P(O|A) as the probability of the 
observation sequence given the model À. 


Algorithm 4: Forward 
Input: S=(s1,S2,S3),O=( 04 0,..07),A = (aij) , B =( biz), T° = (Tf, 79,19 ) 
Output: a = (@0)) , P(O|A) 
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For i=1 to 3 do a;(i)=7;b;,, End 
For t=1toT-1 do 
For j=1to3 do a4,(/) = (Vii a: @aj)djey. End 
End 
P(O|A) = Yd ar) 
Return a,P(O|A) 


2.2.2. Backward Algorithm 
The second algorithm used by Baum-Welch Backward. This algorithm calculates the backward 
variable B;(t) defined as the probability of the partial observation sequence after time t, given state S;: 


Bi(t) = P(Oe410¢42 - Orlde = Si, A) 


Algorithm 5: Backward 
Input : S=(s;,52,53),O=( 0; Oz ... Or), A=(a;j), B=( bi), T° = (T9, 18,78 ) 
Output : B= £,(i) :the probability of the partial observation sequence 
For i=1to3 do fr(i)=1 End 
For t=T—1 to 1 do 
For i=1 to3 do B,(i) = Yes aijBraiG)bjta1 End 
End 
Return f 


The Baum-Welch is then used to re-estimate the parameters of the model A, which maximizes the 
probability of the observation sequence. This algorithm is described as follow: 


Algorithm 6: Baum-Welch 
Input: S=( s1 52,53), 0=(0, 0; ...O7), A= (aij), B=(b;,), 2° = ( a}, 73,73) „B =P, a= (aCi), PCO|A) 
Output: (A) = (@j),B = (bir) 

Repeat 

[a ,P(O|A)] -Forward(0,A,B,m°) ; B-Backward(O, A, B, T°) 
For t=1 to T do 
For i=l to 3 do 

at(DaijbjBt+1 Q) 


For j=1 to 3 do ¢&(ij)=— ow End 
nO = Xj- $y O 
End 
End 
For k=1 to T do 
For i=1 to 3 do 
- ro T P 
For j=1 to 3 do me yQ) ; ay ee ; Dix ee End 
End me t=1Vt 
End 
While (P(O|A) increase) 
Return A,B 


2.3. The Hybridization of HMM and SA 

In the following we will implement a variant simulated annealing based on hidden Markov models. 
The interest behind hybridization the simulated annealing with the HMM is to improve the SA’s 
performance. 


Algorithm 7: HMM-SA algorithm 

Data: The objective function f 

Initialization O:Empty observation sequence, initial temperature, 6;:final 
temperature, x & X):starting point, cmp« 1, p:progression rate, Wg:acceptance rate, 
ne 0:temperature stage 


Repeat 
Repeat 
Xnew & Xx+u {u is a Random vector from the uniform distribution over [0,1)} 
If fnew) -f(x) <0 then xe xy, 
else Generate a pseudo-random number u* {u* €[0,1)} 


If u* < exp(— new PO) then x©eXney End 
n 
End 
Until equmbrium is approached sufficiently closely at 60, 


Update (P, Wy) 
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0, < Generate-Observation(W, p, Rule,, Rule,, Rule, Rule,, Rules) 
Ifcmp < 10 then 
0 - [0,0] ; [A,B] —MLE(O) ; stateeViterbi(0,A,B) ; cmp <cmp+1 ; 
else 
O e [On 0n] ; [A,B] <—Baum-Welch(0,A,B) ; stateViterbi (0,A,B) 
End 
Cooling law Hetate 
On+1 & Cooling law(@,) 
Until 6,,, < 9, indicating that the system is frozen 


3. EXPERIMENT 

The experiment was designed to measure the effects of hybridization of HMM and SA and to show 
how our approach can improve the solution quality, we have chosen five benchmark functions selected from 
the literature (Table 1). 


Table 1. Benchmark functions 


Name Function Formula 
x4 
Six-Hump Camel fix) = (« — 21x? + 2) x2 + X1x + (—4 + 4x2)x2 
Levy N° 13 fa (x) = sin? (3nx1) + (x, — 1) [1 + sin? (3nx3)] + (x2 — 1) [1 + sin? (27rx2)] 
D D 2 
Quadric fh (x) = 3 D x) 
i=1 i=1 
D 
Tablet f(x) = 10°x? +) x? 
i=2 
D 
Sphere f(x) = » x? 
i=1 


The proposed hybridization of SA algorithm and HMM was coded in Scilab programming language 
and experiments were conducted on a PC with an Intel Core 17-5500U 2.40 GHz (4 CPUs) and 8 GB of 
RAM. The hybridization of SA and HMM have been tested using the benchmark functions presented above. 
Each function was tested over 30 trials. We eliminated the effects of other factors which play an important 
role in the performance of algorithm, by choosing the same starting points for all methods (in each run) and 
their location was chosen to be far from basins of attraction of global minima. Also, we have chosen the same 
initial acceptance probability and an identical length of the inner and outer loops. The initial temperature, 09, 
have been calculated from mean energy rises Af during the initialization. Before the start of the SA, the mean 
value of cost rises is estimated by a constant number of moves equal to 100. Then, initial temperature 8g is 
calculated using the following formula 8) = ae [26], where P) is the initial average probability of 
acceptance and is taken equal to 0.95. The length T of observed sequence was chosen equal to 10. 


3.1. Numerical Results 

The computational results and statistical analyses are summarized in Table 2. It provides the details 
of the results for the test functions. The overall best solution of the total 30 replications is shown in bold. 
HMM-SA provides the best solution for the test functions f,,..,f5. In general HMM-SA algorithm 
overcomes the classical variants in all benchmark functions. 


Table 2. Results comparisons between HMM-SA and the classical SA 


Functions HMM-SA CSA 
“f,obest ”~*~<ts~*sé‘s™*sS OB QEHOOD  -1.031E+00 
mean -1.032E+00 -1.031E+00 
fo best 3.768E-06 1.371E-05 
mean 7.864E-02 6.944E-02 
fs best 2.013E-07 6.671E-06 
mean 6.181E-06 4.000E-03 
fa best 9.903E-07 7.112E-06 
mean 6.643E-05 1.432E-03 
fs best 4.311E-09 8.745E-06 
mean 4.662E-06 1.155E-03 
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3.2. Comparison of Convergence Performance 

To obtain further insights into the convergence behavior our approach, HMM-SA method was 
compared to the classical SA. Experiments were designed to measure the effects of the hybridization of SA 
and HMM presented in the previous section. It was noticed that the HMM-SA can converge rapidly to global 
minimum. The time gained in early stage can be used to converge to a better solution. This behavior is 
depicted in Figure 2. 


SA 
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Figure 2. Comparison of HMM-SA and the classical SA 


3.3. Statistical Analysis 

We performed a Mann—Whitney—Wilcoxon (MWW) test [27] to determine whether the algorithm 
reach a significant performance. We choose this statistical test because we have two heuristics to compare. 
The Mann—Whitney—Wilcoxon test compares whether there is any difference from two algorithms. The null 
hypothesis Hp says that the two algorithms have the same means (Hp : 44 = H2 ) and the alternative 
hypothesis H, says that two algorithms have a different means (H; : 44 # Uz). According to table 3, for 
functions fi, f3, fa, fs, the p-value is less than the significance level of a = 0.05. We can reject the null 
hypothesis, so we can conclude that our hybridization of HMM and SA outperforms the classical instance of 
SA. 


Table 3. Statistical analysis for benchmark functions 


4. CONCLUSION 

In this study, we proposed a self-tuning capability of simulated annealing based on Hidden Markov 
Model. To test the performance of this approach, it was applied to a number of benchmark functions selected 
from literature. This approach allows to controls the cooling of SA during the run, based on sequence of state 
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generated from a set of rules. The HMM parameters are calculated and updated at each cooling step. The 
Viterbi algorithm is then used to classify the observed sequence. The comparisons of the proposed approach 
and the classical simulated annealing demonstrate that the simulated annealing based on HMM classifier is 
able to find better solutions in reasonable time. Our approach is able to manage time by rapidly decreasing 
temperature and thus anticipating exploitation state, this lead to a better convergence. Future research may be 
compared to SA with fuzzy logic controllers and the application of our method to some optimization 
problems should be pursued. 
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